Results 1 - 20 of 548
1.
iScience ; 27(5): 109570, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38646172

ABSTRACT

The three-dimensional organization of genomes plays a crucial role in essential biological processes. The segregation of chromatin into A and B compartments highlights regions of activity and inactivity, providing a window into the genomic activities specific to each cell type. Yet, the steep costs associated with acquiring Hi-C data, necessary for studying this compartmentalization across various cell types, pose a significant barrier in studying cell-type-specific genome organization. To address this, we present a prediction tool called compartment prediction using recurrent neural networks (CoRNN), which predicts compartmentalization of the 3D genome using histone modification enrichment. CoRNN demonstrates robust cross-cell-type prediction of A/B compartments with an average AuROC of 90.9%. Cell-type-specific predictions align well with known functional elements, with H3K27ac and H3K36me3 identified as highly predictive histone marks. We further investigated our mispredictions and found that they are located in regions with ambiguous compartmental status. Furthermore, our model's generalizability is validated by predicting compartments in independent tissue samples, which underscores its broad applicability.
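
As a reader's aid for the AuROC figure quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed from per-bin labels and scores, using the rank-sum (Mann-Whitney U) identity. It is not taken from CoRNN; the function name `auroc` and the encoding of compartment A as 1 and B as 0 are assumptions made for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 (hypothetical encoding: 1 = compartment A, 0 = B)
    scores: sequence of predicted scores, higher meaning "more likely 1"
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the ties
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AuROC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy data, not from the paper.
    print(auroc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

The value is the fraction of (positive, negative) pairs in which the positive bin receives the higher score, with ties counted as one half.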

2.
bioRxiv ; 2024 Mar 18.
Article in English | MEDLINE | ID: mdl-38562832

ABSTRACT

Genome-wide association studies (GWAS) and expression analyses implicate noncoding regulatory regions as harboring risk factors for psychiatric disease, but functional characterization of these regions remains limited. We performed capture STARR-sequencing of over 78,000 candidate regions to identify active enhancers in primary human neural progenitor cells (phNPCs). We selected candidate regions by integrating data from NPCs, prefrontal cortex, developmental timepoints, and GWAS. Over 8,000 regions demonstrated enhancer activity in the phNPCs, and we linked these regions to over 2,200 predicted target genes. These genes are involved in neuronal and psychiatric disease-associated pathways, including dopaminergic synapse, axon guidance, and schizophrenia. We functionally validated a subset of these enhancers using mutation STARR-sequencing and CRISPR deletions, demonstrating the effects of genetic variation on enhancer activity and enhancer deletion on gene expression. Overall, we identified thousands of highly active enhancers and functionally validated a subset of these enhancers, improving our understanding of regulatory networks underlying brain function and disease.

3.
bioRxiv ; 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38562714

ABSTRACT

Precision of transcription is critical because transcriptional dysregulation is disease causing. Traditional methods of transcriptional profiling are inadequate to elucidate the full spectrum of the transcriptome, particularly for longer and less abundant mRNAs. SHANK3 is one of the most common autism causative genes. Twenty-four Shank3 mutant animal lines have been developed for autism modeling. However, their preclinical validity has been questioned due to incomplete Shank3 transcript structure. We applied an integrative approach combining cDNA-capture and long-read sequencing to profile the SHANK3 transcriptome in humans and mice. We unexpectedly discovered an extremely complex SHANK3 transcriptome. Specific SHANK3 transcripts were altered in Shank3 mutant mice and postmortem brain tissues from individuals with ASD. The enhanced SHANK3 transcriptome significantly improved the detection rate for potential deleterious variants from genomics studies of neuropsychiatric disorders. Our findings suggest the stochastic transcription of the genome associated with SHANK family genes.

4.
J R Soc Interface ; 21(212): 20230647, 2024 03.
Article in English | MEDLINE | ID: mdl-38503341

ABSTRACT

Cultural processes of change bear many resemblances to biological evolution. The underlying units of non-biological evolution have, however, remained elusive, especially in the domain of music. Here, we introduce a general framework to jointly identify underlying units and their associated evolutionary processes. We model musical styles and principles of organization in dimensions such as harmony and form as following an evolutionary process. Furthermore, we propose that such processes can be identified by extracting latent evolutionary signatures from musical corpora, analogously to identifying mutational signatures in genomics. These signatures provide a latent embedding for each song or musical piece. We develop a deep generative architecture for our model, which can be viewed as a type of variational autoencoder with an evolutionary prior constraining the latent space; specifically, the embeddings for each song are tied together via an energy-based prior, which encourages songs close in evolutionary space to share similar representations. As illustration, we analyse songs from the McGill Billboard dataset. We find frequent chord transitions and formal repetition schemes and identify latent evolutionary signatures related to these features. Finally, we show that the latent evolutionary representations learned by our model outperform non-evolutionary representations in such tasks as period and genre prediction.


Subjects
Cultural Evolution; Music; Genomics
5.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38493342

ABSTRACT

Dynamic compartmentalization of eukaryotic DNA into active and repressed states enables diverse transcriptional programs to arise from a single genetic blueprint, whereas its dysregulation can be strongly linked to a broad spectrum of diseases. While single-cell Hi-C experiments allow for chromosome conformation profiling across many cells, they are still expensive and not widely available for most labs. Here, we propose an alternative approach, scENCORE, to computationally reconstruct chromatin compartments from the more affordable and widely accessible single-cell epigenetic data. First, scENCORE constructs a long-range epigenetic correlation graph to mimic chromatin interaction frequencies, where nodes and edges represent genome bins and their correlations. Then, it learns the node embeddings to cluster genome regions into A/B compartments and aligns different graphs to quantify chromatin conformation changes across conditions. Benchmarking using cell-type-matched Hi-C experiments demonstrates that scENCORE can robustly reconstruct A/B compartments in a cell-type-specific manner. Furthermore, our chromatin conformation switching studies highlight substantial compartment-switching events that may introduce substantial regulatory and transcriptional changes in psychiatric disease. In summary, scENCORE allows accurate and cost-effective A/B compartment reconstruction to delineate higher-order chromatin structure heterogeneity in complex tissues.


Subjects
Chromatin; Chromosomes; Chromatin/genetics; DNA; Molecular Conformation; Epigenesis, Genetic
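
To make the two steps in the abstract above concrete (build a correlation graph over genome bins, then split bins into two groups), here is a toy sketch in plain Python. It is not scENCORE's algorithm: the learned node embeddings and graph alignment are replaced by the sign of the leading eigenvector of the bin-by-bin correlation matrix, a classic PCA-style compartment call. The function names, the `signal` layout (one row per bin, one column per cell), and the toy values are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def correlation_graph(signal):
    """Dense adjacency matrix: entry [i][j] is the correlation between bins i and j."""
    n = len(signal)
    return [[pearson(signal[i], signal[j]) for j in range(n)] for i in range(n)]


def leading_eigenvector(matrix, iters=200):
    """Power iteration; suited to correlation matrices, which are positive semidefinite."""
    n = len(matrix)
    v = [1.0 + 0.01 * i for i in range(n)]  # deterministic, non-uniform start vector
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def split_bins(signal):
    """Assign each bin to group 0 or 1 by the sign of the leading eigenvector.

    The group numbers are arbitrary: deciding which group is compartment A
    needs extra information (for example, enrichment of active marks).
    """
    vec = leading_eigenvector(correlation_graph(signal))
    return [0 if x >= 0 else 1 for x in vec]


if __name__ == "__main__":
    # Hypothetical accessibility signal: 4 bins (rows) across 4 cells (columns).
    toy = [[5, 4, 1, 0], [6, 5, 2, 1], [0, 1, 4, 5], [1, 2, 5, 6]]
    print(split_bins(toy))
```

On real data the leading component does not always track compartments, so a call like this would normally be checked against matched Hi-C, as the benchmarking described in the abstract does.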
6.
bioRxiv ; 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38370833

ABSTRACT

Spatial transcriptomics has emerged as a powerful tool for dissecting spatial cellular heterogeneity but as of today is largely limited to gene expression analysis. Yet, the life of RNA molecules is multifaceted and dynamic, requiring spatial profiling of different RNA species throughout the life cycle to delve into the intricate RNA biology in complex tissues. Human disease-relevant tissues are commonly preserved as formalin-fixed and paraffin-embedded (FFPE) blocks, representing an important resource for human tissue specimens. The capability to spatially explore RNA biology in FFPE tissues holds transformative potential for human biology research and clinical histopathology. Here, we present Patho-DBiT combining in situ polyadenylation and deterministic barcoding for spatial full coverage transcriptome sequencing, tailored for probing the diverse landscape of RNA species even in clinically archived FFPE samples. It permits spatial co-profiling of gene expression and RNA processing, unveiling region-specific splicing isoforms, and high-sensitivity transcriptomic mapping of clinical tumor FFPE tissues stored for five years. Furthermore, genome-wide single nucleotide RNA variants can be captured to distinguish different malignant clones from non-malignant cells in human lymphomas. Patho-DBiT also maps microRNA-mRNA regulatory networks and RNA splicing dynamics, decoding their roles in spatial tumorigenesis trajectory. High resolution Patho-DBiT at the cellular level reveals a spatial neighborhood and traces the spatiotemporal kinetics driving tumor progression. Patho-DBiT stands poised as a valuable platform to unravel rich RNA biology in FFPE tissues to study human tissue biology and aid in clinical pathology evaluation.

7.
bioRxiv ; 2024 Jan 20.
Article in English | MEDLINE | ID: mdl-38293065

ABSTRACT

A catalog of transcription factor (TF) binding sites in the genome is critical for deciphering regulatory relationships. Here we present the culmination of the modERN (model organism Encyclopedia of Regulatory Networks) consortium that systematically assayed TF binding events in vivo in two major model organisms, Drosophila melanogaster (fly) and Caenorhabditis elegans (worm). We describe key features of these datasets, comprising 604 TFs identifying 3.6 M sites in the fly and 350 TFs identifying 0.9 M sites in the worm. Applying a machine learning model to these data identifies sets of TFs with a prominent role in promoting target gene expression in specific cell types. TF binding data are available through the ENCODE Data Coordinating Center and at https://epic.gs.washington.edu/modERNresource, which provides access to processed and summary data, as well as widgets to probe cell type-specific TF-target relationships. These data are a rich resource that should fuel investigations into TF function during development.

8.
Nucleic Acids Res ; 52(4): e20, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38214231

ABSTRACT

Numerous statistical methods have emerged for inferring DNA motifs for transcription factors (TFs) from genomic regions. However, the process of selecting informative regions for motif inference remains understudied. Current approaches select regions with strong ChIP-seq signal for a given TF, assuming that such strong signal primarily results from specific interactions between the TF and its motif. Additionally, these selection approaches do not account for non-target motifs, i.e. motifs of other TFs; they presume the occurrence of these non-target motifs to be infrequent compared to that of the target motif, and thus assume these have minimal interference with the identification of the target. Leveraging extensive ChIP-seq datasets, we introduced the concept of TF signal 'crowdedness', referred to as C-score, for each genomic region. The C-score helps in highlighting TF signals arising from non-specific interactions. Moreover, by considering the C-score (and adjusting for the length of genomic regions), we can effectively mitigate interference of non-target motifs. Using these tools, we find that in many instances, strong ChIP-seq signal stems mainly from non-specific interactions, and the occurrence of non-target motifs significantly impacts the accurate inference of the target motif. Prioritizing genomic regions with reduced crowdedness and short length markedly improves motif inference. This 'less-is-more' effect suggests that ChIP-seq region selection warrants more attention.


Subjects
Genomics; Nucleotide Motifs; Transcription Factors; Binding Sites; Chromatin Immunoprecipitation; Protein Binding; Transcription Factors/genetics; Transcription Factors/metabolism
9.
Genome Res ; 2023 Dec 14.
Article in English | MEDLINE | ID: mdl-38097386

ABSTRACT

Single nucleotide polymorphisms (SNPs) from omics data create a reidentification risk for individuals and their relatives. Although the ability of thousands of SNPs (especially rare ones) to identify individuals has been repeatedly shown, the availability of small sets of noisy genotypes, from environmental DNA samples or functional genomics data, motivated us to quantify their informativeness. We present a computational tool suite, termed Privacy Leakage by Inference across Genotypic HMM Trajectories (PLIGHT), using population-genetics-based hidden Markov models (HMMs) of recombination and mutation to find piecewise alignment of small, noisy SNP sets to reference haplotype databases. We explore cases in which query individuals are either known to be in the database, or not, and consider several genotype queries, including those from environmental sample swabs from known individuals and from simulated "mosaics" (two-individual composites). Using PLIGHT on a database with ∼5000 haplotypes, we find for common, noise-free SNPs that only ten are sufficient to identify individuals, ∼20 can identify both components in two-individual mosaics, and 20-30 can identify first-order relatives. Using noisy environmental-sample-derived SNPs, PLIGHT identifies individuals in a database using ∼30 SNPs. Even when the individuals are not in the database, local genotype matches allow for some phenotypic information leakage based on coarse-grained SNP imputation. Finally, by quantifying privacy leakage from sparse SNP sets, PLIGHT helps determine the value of selectively sanitizing released SNPs without explicit assumptions about population membership or allele frequency. To make this practical, we provide a sanitization tool to remove the most identifying SNPs from genomic data.

10.
Science ; 381(6663): 1162, 2023 Sep 15.
Article in English | MEDLINE | ID: mdl-37708273

ABSTRACT

Society's obsession with optimization has a cost, argues a mathematical modeler.

11.
Sci Adv ; 9(29): eadf4163, 2023 07 21.
Article in English | MEDLINE | ID: mdl-37467337

ABSTRACT

Aging is a leading risk factor for cancer. While it is proposed that age-related accumulation of somatic mutations drives this relationship, it is likely not the full story. We show that aging and cancer share a common epigenetic replication signature, which we modeled using DNA methylation from extensively passaged immortalized human cells in vitro and tested on clinical tissues. This signature, termed CellDRIFT, increased with age across multiple tissues, distinguished tumor from normal tissue, was escalated in normal breast tissue from cancer patients, and was transiently reset upon reprogramming. In addition, within-person tissue differences were correlated with predicted lifetime tissue-specific stem cell divisions and tissue-specific cancer risk. Our findings suggest that age-related replication may drive epigenetic changes in cells and could push them toward a more tumorigenic state.


Subjects
Epigenome; Neoplasms; Humans; Neoplasms/genetics; Neoplasms/pathology; Epigenesis, Genetic; Aging/genetics; Risk Factors
12.
PLoS Comput Biol ; 19(7): e1011222, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37410793

ABSTRACT

The COVID-19 pandemic caused by the SARS-CoV-2 virus has resulted in millions of deaths worldwide. The disease presents with various manifestations that can vary in severity and long-term outcomes. Previous efforts have contributed to the development of effective strategies for treatment and prevention by uncovering the mechanism of viral infection. We now know all the direct protein-protein interactions that occur during the lifecycle of SARS-CoV-2 infection, but it is critical to move beyond these known interactions to a comprehensive understanding of the "full interactome" of SARS-CoV-2 infection, which incorporates human microRNAs (miRNAs), additional human protein-coding genes, and exogenous microbes. Potentially, this will help in developing new drugs to treat COVID-19, differentiating the nuances of long COVID, and identifying histopathological signatures in SARS-CoV-2-infected organs. To construct the full interactome, we developed a statistical modeling approach called MLCrosstalk (multiple-layer crosstalk) based on latent Dirichlet allocation. MLCrosstalk integrates data from multiple sources, including microbes, human protein-coding genes, miRNAs, and human protein-protein interactions. It constructs "topics" that group SARS-CoV-2 with genes and microbes based on similar patterns of co-occurrence across patient samples. We use these topics to infer linkages between SARS-CoV-2 and protein-coding genes, miRNAs, and microbes. We then refine these initial linkages using network propagation to contextualize them within a larger framework of network and pathway structures. Using MLCrosstalk, we identified genes in the IL1-processing and VEGFA-VEGFR2 pathways that are linked to SARS-CoV-2. We also found that Rothia mucilaginosa and Prevotella melaninogenica are positively and negatively correlated with SARS-CoV-2 abundance, a finding corroborated by analysis of single-cell sequencing data.


Subjects
COVID-19; MicroRNAs; Humans; SARS-CoV-2/genetics; Post-Acute COVID-19 Syndrome; Pandemics/prevention & control; MicroRNAs/genetics
13.
bioRxiv ; 2023 May 16.
Article in English | MEDLINE | ID: mdl-37292896

ABSTRACT

The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.
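
The triplet idea described above (transcript start site, exon junction chain, transcript end site) can be illustrated with a small sketch. The abstract does not say how triplets are mapped onto the simplex, so the normalization used here (each gene's share of distinct start sites, junction chains, and end sites) is an assumption, as are the function name `gene_simplex` and the identifiers in the toy input.

```python
from collections import defaultdict


def gene_simplex(transcripts):
    """Map each gene to a point on the 2-simplex from its transcript triplets.

    transcripts: iterable of (gene, tss_id, junction_chain_id, tes_id) tuples.
    Returns {gene: (tss_share, chain_share, tes_share)}; the three shares sum to 1.
    The share-of-distinct-features normalization is an illustrative assumption.
    """
    seen = defaultdict(lambda: (set(), set(), set()))
    for gene, tss, chain, tes in transcripts:
        tss_set, chain_set, tes_set = seen[gene]
        tss_set.add(tss)
        chain_set.add(chain)
        tes_set.add(tes)
    coords = {}
    for gene, feature_sets in seen.items():
        counts = [len(s) for s in feature_sets]
        total = sum(counts)
        coords[gene] = tuple(c / total for c in counts)
    return coords


if __name__ == "__main__":
    # Hypothetical gene with three start sites, one junction chain, one end site.
    toy = [
        ("G1", "tss_1", "ec_1", "tes_1"),
        ("G1", "tss_2", "ec_1", "tes_1"),
        ("G1", "tss_3", "ec_1", "tes_1"),
    ]
    print(gene_simplex(toy))
```

Under this assumed normalization, a gene whose point sits near the first corner diversifies mainly through promoter choice, near the second through splicing, and near the third through 3' end choice.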

14.
Mol Cell ; 83(12): 1983-2002.e11, 2023 Jun 15.
Article in English | MEDLINE | ID: mdl-37295433

ABSTRACT

The evolutionarily conserved minor spliceosome (MiS) is required for protein expression of ∼714 minor intron-containing genes (MIGs) crucial for cell-cycle regulation, DNA repair, and MAP-kinase signaling. We explored the role of MIGs and MiS in cancer, taking prostate cancer (PCa) as an exemplar. Both androgen receptor signaling and elevated levels of U6atac, a MiS small nuclear RNA, regulate MiS activity, which is highest in advanced metastatic PCa. siU6atac-mediated MiS inhibition in PCa in vitro model systems resulted in aberrant minor intron splicing leading to cell-cycle G1 arrest. Small interfering RNA knocking down U6atac was ∼50% more efficient in lowering tumor burden in models of advanced therapy-resistant PCa compared with standard antiandrogen therapy. In lethal PCa, siU6atac disrupted the splicing of a crucial lineage dependency factor, the RE1-silencing factor (REST). Taken together, we have nominated MiS as a vulnerability for lethal PCa and potentially other cancers.


Subjects
Prostatic Neoplasms, Castration-Resistant; Prostatic Neoplasms; Male; Humans; Introns/genetics; Prostatic Neoplasms/metabolism; RNA Splicing/genetics; Spliceosomes/metabolism; Signal Transduction; Receptors, Androgen/genetics; Receptors, Androgen/metabolism; Cell Line, Tumor; Prostatic Neoplasms, Castration-Resistant/genetics
15.
Science ; 380(6645): 589, 2023 May 12.
Article in English | MEDLINE | ID: mdl-37167370

ABSTRACT

A quantum computing primer offers insights into the technology's most promising potential applications.

16.
Sleep Med ; 107: 212-218, 2023 07.
Article in English | MEDLINE | ID: mdl-37235891

ABSTRACT

Public health officials and clinicians routinely advise social media users to avoid nighttime social media use due to the perception that this delays the onset of sleep and predisposes to the health risks of insufficient sleep. With some exceptions, the evidence behind this advice mostly derives from surveys identifying an association between self-reported social media usage and self-reported sleep patterns. In principle, these associations could alternatively be explained by users turning to social media to pass the time when they are otherwise having difficulty sleeping, or by individual differences that draw some people to frequent social media use, or by offline activities that overlap with both social media use and delayed sleep. To attempt to distinguish among these explanations, we leveraged estimated bedtimes from 44,000 Reddit users reported in a recent study and their 120 million posts to test whether the relationship between sleep and social media has properties suggestive of a causal relationship. We find that users are especially likely to be active on Reddit after their bedtime (and therefore awake) on nights that they posted to Reddit shortly before bedtime, especially if they posted multiple times or in high-engagement forums that night. Overall, this study lends additional support to the notion that there likely is some causal effect of evening social media use on delayed sleep onset.


Subjects
Sleep Disorders, Circadian Rhythm; Social Media; Adult; Female; Humans; Male; Young Adult; Circadian Rhythm; Prevalence; Self Report; Sleep Disorders, Circadian Rhythm/epidemiology; Time Factors
17.
Cell Genom ; 3(5): 100303, 2023 May 10.
Article in English | MEDLINE | ID: mdl-37228754

ABSTRACT

Although the role of RNA binding proteins (RBPs) in extracellular RNA (exRNA) biology is well established, their exRNA cargo and distribution across biofluids are largely unknown. To address this gap, we extend the exRNA Atlas resource by mapping exRNAs carried by extracellular RBPs (exRBPs). This map was developed through an integrative analysis of ENCODE enhanced crosslinking and immunoprecipitation (eCLIP) data (150 RBPs) and human exRNA profiles (6,930 samples). Computational analysis and experimental validation identified exRBPs in plasma, serum, saliva, urine, cerebrospinal fluid, and cell-culture-conditioned medium. exRBPs carry exRNA transcripts from small non-coding RNA biotypes, including microRNA (miRNA), piRNA, tRNA, small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), Y RNA, and lncRNA, as well as protein-coding mRNA fragments. Computational deconvolution of exRBP RNA cargo reveals associations of exRBPs with extracellular vesicles, lipoproteins, and ribonucleoproteins across human biofluids. Overall, we mapped the distribution of exRBPs across human biofluids, presenting a resource for the community.

18.
Sci Rep ; 13(1): 8470, 2023 05 25.
Article in English | MEDLINE | ID: mdl-37231011

ABSTRACT

For the COVID-19 pandemic, viral transmission has been documented in many historical and geographical contexts. Nevertheless, few studies have explicitly modeled the spatiotemporal flow based on genetic sequences to develop mitigation strategies. Additionally, thousands of SARS-CoV-2 genomes have been sequenced with associated records, an unprecedented amount during a single outbreak, potentially providing a rich source for such spatiotemporal analysis. Here, in a case study of seven states, we model the first wave of the outbreak by determining regional connectivity from phylogenetic sequence information (i.e. "genetic connectivity"), in addition to traditional epidemiologic and demographic parameters. Our study shows nearly all of the initial outbreak can be traced to a few lineages, rather than disconnected outbreaks, indicative of a mostly continuous initial viral flow. While the geographic distance from hotspots is initially important in the modeling, genetic connectivity becomes increasingly significant later in the first wave. Moreover, our model predicts that isolated local strategies (e.g. relying on herd immunity) can negatively impact neighboring regions, suggesting more efficient mitigation is possible with unified, cross-border interventions. Finally, our results suggest that a few targeted interventions based on connectivity can have an effect similar to that of an overall lockdown. They also suggest that while successful lockdowns are very effective in mitigating an outbreak, less disciplined lockdowns quickly decrease in effectiveness. Our study provides a framework for combining phylodynamic and computational methods to identify targeted interventions.


Subjects
COVID-19; Humans; COVID-19/epidemiology; COVID-19/prevention & control; SARS-CoV-2/genetics; Pandemics/prevention & control; Phylogeny; Communicable Disease Control/methods; Disease Outbreaks
19.
medRxiv ; 2023 Mar 06.
Article in English | MEDLINE | ID: mdl-36945630

ABSTRACT

Genomic regulatory elements active in the developing human brain are notably enriched in genetic risk for neuropsychiatric disorders, including autism spectrum disorder (ASD), schizophrenia, and bipolar disorder. However, prioritizing the specific risk genes and candidate molecular mechanisms underlying these genetic enrichments has been hindered by the lack of a single unified large-scale gene regulatory atlas of human brain development. Here, we uniformly process and systematically characterize gene, isoform, and splicing quantitative trait loci (xQTLs) in 672 fetal brain samples from unique subjects across multiple ancestral populations. We identify 15,752 genes harboring a significant xQTL and map 3,739 eQTLs to a specific cellular context. We observe a striking drop in gene expression and splicing heritability as the human brain develops. Isoform-level regulation, particularly in the second trimester, mediates the greatest proportion of heritability across multiple psychiatric GWAS, compared with eQTLs. Via colocalization and TWAS, we prioritize biological mechanisms for ~60% of GWAS loci across five neuropsychiatric disorders, nearly two-fold that observed in the adult brain. Finally, we build a comprehensive set of developmentally regulated gene and isoform co-expression networks capturing unique genetic enrichments across disorders. Together, this work provides a comprehensive view of genetic regulation across human brain development as well as the stage- and cell-type-informed mechanistic underpinnings of neuropsychiatric disorders.
